Asa Cooper Stickland

RepliBench: Evaluating the autonomous replication capabilities of language model agents

Apr 21, 2025
Via arXiv

Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods

Nov 20, 2024
Via arXiv

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Jul 22, 2024
Via arXiv

Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

Jul 04, 2024
Via arXiv

Steering Without Side Effects: Improving Post-Deployment Control of Language Models

Jun 21, 2024
Via arXiv

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Nov 20, 2023
Via arXiv

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Sep 22, 2023
Via arXiv

Taken out of context: On measuring situational awareness in LLMs

Sep 01, 2023
Via arXiv

Robustification of Multilingual Language Models to Real-world Noise with Robust Contrastive Pretraining

Oct 10, 2022
Via arXiv

When does Parameter-Efficient Transfer Learning Work for Machine Translation?

May 23, 2022
Via arXiv